Cross Language Information Integration Bridging the Gap

Author

  • Ramanathan Palaniappan
Abstract

Integrating information across multiple natural languages is a challenging task that often requires manually created linguistic resources, such as a bilingual dictionary or examples of direct translations of text. Comparable corpora have important properties that can be exploited to infer word translations without a bilingual dictionary. The main premise underlying comparable corpora is that the translations of two words that co-occur in the source language also co-occur in the target language. Past work has used this property to extract a lexicon directly from comparable corpora. A major drawback of that work is that the number of parameters to be estimated (translation probabilities) increases quadratically with the vocabulary sizes. In this paper, we propose a novel cluster-based approach that maps groups of related words rather than individual words. This method has the advantage that the number of parameters to be estimated remains independent of the vocabulary size. Experiments show that the computational demands of this problem are very high and that the method appears to work well on small data sets. Approximating the multinomials produced by the clustering algorithm (LDA) to speed up computation degrades the performance of the model.
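The parameter-count argument in the abstract can be illustrated with a short sketch. It assumes, purely for illustration, that a word-level model stores one translation probability per (source word, target word) pair and that a cluster-level model stores one mapping weight per (source cluster, target cluster) pair. The function names and the choice of 50 clusters are hypothetical and are not taken from the paper.

```python
def word_level_params(src_vocab: int, tgt_vocab: int) -> int:
    """One translation probability per (source word, target word) pair."""
    return src_vocab * tgt_vocab


def cluster_level_params(src_clusters: int, tgt_clusters: int) -> int:
    """One mapping weight per (source cluster, target cluster) pair.

    Assumption: the cluster counts are fixed in advance, so this number
    does not change when the vocabularies grow.
    """
    return src_clusters * tgt_clusters


if __name__ == "__main__":
    clusters = 50  # hypothetical number of clusters per language
    for vocab in (1_000, 10_000, 100_000):
        print(
            f"vocab={vocab:>7}  "
            f"word-level={word_level_params(vocab, vocab):>14,}  "
            f"cluster-level={cluster_level_params(clusters, clusters):,}"
        )
```

Under these assumptions, doubling both vocabularies quadruples the word-level count while the cluster-level count stays the same. The cost of fitting the clusters themselves is not counted here.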

Similar resources

Cross border E-Science and Research Partnership: Bridging the Gap Between Science and Media

  E-Science is a tool that helps scientists to store, interpret, analyze and make a network of their data, and it can play a critical role in different aspects of the scientific goals and research. This commentary, under the topic of Cross Border E-Science and Research Partnership: Bridging the Gap between Science and Media,[1] attempts to shed light on E-Science with emphasis on three importa...

The Impact of Skill Integration on Task Involvement Load

The present study investigated whether word learning and retention in a second language are contingent upon a task's involvement load, i.e., the amount of need, search, and evaluation the task imposes. Laufer and Hulstijn (2001) contend that tasks with higher degrees of these three components induce higher involvement load, and are, therefore, more effective for word learning. To test this clai...

Cross Language Information Retrieval: an Experiment in Bilingual News Article Alignment from the Internet using MT

Cross Language Information Retrieval (CLIR) offers the potential for users to search document collections in foreign languages. This is particularly relevant now that the Internet has become a global information source. Machine translation (MT) has a key role in bridging the gap between the language of the users' query and that of the document collection as well as to help the user understand th...

Increasing the Effectiveness of Russian Language Teaching for Special Purposes (to the Problem of Integration of Language Training with Information Technology Courses)

The article is devoted to the problem of increasing the efficiency of language teaching for the special purposes of foreign students in studying Russian at a technical university. Particular attention is paid to the training of foreign students in the skills of working with information using the latest computer technology. The conclusions of the work are based on the analysis of the results of ...

Scholarship and practice: the contribution of ethnographic research methods to bridging the gap

Introduction Research methods are the means by which knowledge is acquired and constructed within a discipline. Research methods need to be both relevant and rigorous in order to be accepted as legitimate within a particular field of knowledge. Information systems (IS) is a field which has multiple stakeholders in its knowledge development, operating in contexts which have to deal with multipli...

Image-Language Association: are we looking at the right features?

The ever-growing popularity and availability of multimedia information has rendered automatic image-language association essential in a number of multimedia integration applications. Bridging the gap between the two media requires an appropriate feature-set for describing their common reference; one that will be both distinctive of the entities referred to and feasible to extract automatically...


Publication date: 2006